Dataset statistics
| Number of variables | 16 |
|---|---|
| Number of observations | 253155 |
| Missing cells | 53018 |
| Missing cells (%) | 1.3% |
| Duplicate rows | 6 |
| Duplicate rows (%) | < 0.1% |
| Total size in memory | 30.9 MiB |
| Average record size in memory | 128.0 B |
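The derived figures in this table can be re-computed from the raw counts. The sketch below is only a sanity check, not the profiler's own code; the 8-bytes-per-column figure is an assumption chosen because it reproduces the reported 128.0 B record size.

```python
# Figures copied from the "Dataset statistics" table above.
n_variables = 16
n_observations = 253155
missing_cells = 53018
duplicate_rows = 6

total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

# Assumption: every column costs 8 bytes per row (a 64-bit number or an
# object pointer), which gives 16 * 8 = 128 bytes per record.
bytes_per_record = n_variables * 8
total_mib = n_observations * bytes_per_record / 2**20

print(f"missing cells: {missing_pct:.1f}%")
print(f"duplicate rows: {duplicate_pct:.4f}%")
print(f"memory: {total_mib:.1f} MiB at {bytes_per_record} B per record")
```

Under that assumption the numbers agree with the table: about 1.3% missing cells, well under 0.1% duplicate rows, and roughly 30.9 MiB in total.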
Variable types
| Numeric | 9 |
|---|---|
| Categorical | 7 |
Alerts
| Alert | Type |
|---|---|
| Dataset has 6 (< 0.1%) duplicate rows | Duplicates |
| Store_name has a high cardinality: 17738 distinct values | High cardinality |
| cate2 has a high cardinality: 74 distinct values | High cardinality |
| ID has a high cardinality: 1727 distinct values | High cardinality |
| jeju_person_sales is highly correlated with jeju_person_sales_num and 2 other fields | High correlation |
| jeju_person_sales_num is highly correlated with jeju_person_sales and 1 other field | High correlation |
| other_person_sales is highly correlated with other_person_sales_num and 3 other fields | High correlation |
| other_person_sales_num is highly correlated with other_person_sales and 3 other fields | High correlation |
| tot_sales is highly correlated with other_person_sales and 3 other fields | High correlation |
| tot_sales_num is highly correlated with jeju_person_sales and 5 other fields | High correlation |
| change is highly correlated with jeju_person_sales and 5 other fields | High correlation |
| ranking is highly correlated with Date and 1 other field | High correlation |
| si_gune_gu is highly correlated with Dong and 1 other field | High correlation |
| Dong is highly correlated with si_gune_gu and 1 other field | High correlation |
| loc is highly correlated with si_gune_gu and 1 other field | High correlation |
| cate1 is highly correlated with cate2 | High correlation |
| cate2 is highly correlated with cate1 | High correlation |
| Date is highly correlated with ranking | High correlation |
| change has 26509 (10.5%) missing values | Missing |
| ranking has 26509 (10.5%) missing values | Missing |
| other_person_sales_num is highly skewed (γ1 = 20.90775832) | Skewed |
| jeju_person_sales has 12767 (5.0%) zeros | Zeros |
| jeju_person_sales_num has 11661 (4.6%) zeros | Zeros |
| other_person_sales has 82421 (32.6%) zeros | Zeros |
| other_person_sales_num has 27697 (10.9%) zeros | Zeros |
| tot_sales has 21173 (8.4%) zeros | Zeros |
| change has 10216 (4.0%) zeros | Zeros |
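The "Skewed" alert reports a skewness coefficient γ1. As a rough illustration, the sketch below uses the plain moment-based definition (third central moment divided by the second central moment raised to the power 1.5). This is an assumption about the formula: the profiling tool may apply a sample-size correction, so its figures need not match this exactly. The input lists are made up for the example.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data: one symmetric sample, one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [0, 0, 0, 0.01, 0.02, 0.05, 0.1, 100]

print(skewness(symmetric))
print(skewness(right_tailed))
```

A symmetric sample gives zero, while a sample dominated by small values with a few very large ones gives a clearly positive coefficient, which is the pattern the zero-heavy sales columns in this report show.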
Reproduction
| Analysis started | 2022-11-26 01:19:15.254168 |
|---|---|
| Analysis finished | 2022-11-26 01:19:50.387404 |
| Duration | 35.13 seconds |
| Software version | pandas-profiling v3.4.0 |
| Download configuration | config.json |
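The reported duration follows from the two timestamps. A minimal check, assuming the timestamps use the format printed in the table:

```python
from datetime import datetime

# Format assumed from the timestamps shown in the Reproduction table.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2022-11-26 01:19:15.254168", FMT)
finished = datetime.strptime("2022-11-26 01:19:50.387404", FMT)

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```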
Date
| Distinct | 18 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 202144.7132 |
| Minimum | 202101 |
| Maximum | 202207 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.9 MiB |
Quantile statistics
| Minimum | 202101 |
|---|---|
| 5-th percentile | 202101 |
| Q1 | 202106 |
| median | 202111 |
| Q3 | 202203 |
| 95-th percentile | 202207 |
| Maximum | 202207 |
| Range | 106 |
| Interquartile range (IQR) | 97 |
Descriptive statistics
| Standard deviation | 47.35496332 |
|---|---|
| Coefficient of variation (CV) | 0.0002342626852 |
| Kurtosis | -1.78304966 |
| Mean | 202144.7132 |
| Median Absolute Deviation (MAD) | 7 |
| Skewness | 0.449001644 |
| Sum | 5.117394488 × 10^10 |
| Variance | 2242.492551 |
| Monotonicity | Increasing |
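The values of this variable look like months encoded as YYYYMM integers, so Range (106) and IQR (97) are differences of raw integers rather than counts of calendar months. The sketch below assumes that encoding (the report does not state it) and uses a hypothetical helper, `month_index`, to contrast the raw range with the calendar distance. It also re-derives the coefficient of variation from the reported standard deviation and mean.

```python
def month_index(yyyymm: int) -> int:
    """Convert an integer such as 202207 into a running count of months."""
    year, month = divmod(yyyymm, 100)
    return year * 12 + (month - 1)

minimum, maximum = 202101, 202207

raw_range = maximum - minimum  # difference of the raw integers
months_apart = month_index(maximum) - month_index(minimum)

# Coefficient of variation = standard deviation / mean (values from the table).
std, mean = 47.35496332, 202144.7132
cv = std / mean

print(raw_range, months_apart, f"{cv:.10f}")
```

Under this reading the minimum and maximum are 18 months apart, so the span covers 19 calendar months. That is consistent with the 18 distinct values reported if one month in the span has no rows.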
Histogram with fixed size bins (bins=18)
| Value | Count | Frequency (%) |
| 202111 | 14585 | 5.8% |
| 202110 | 14514 | 5.7% |
| 202112 | 14514 | 5.7% |
| 202201 | 14332 | 5.7% |
| 202107 | 14272 | 5.6% |
| 202203 | 14226 | 5.6% |
| 202204 | 14189 | 5.6% |
| 202108 | 14122 | 5.6% |
| 202109 | 14117 | 5.6% |
| 202202 | 14117 | 5.6% |
| Other values (8) | 110167 | 43.5% |
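The Frequency (%) column is each count divided by the number of observations, and the "Other values" row is the remainder. A small sketch of that arithmetic, using the counts from the table above; the helper name `share` is hypothetical, and unlike the report it does not print very small shares as "< 0.1%".

```python
total_rows = 253155  # "Number of observations" from the dataset statistics

# Ten most common values and their counts, copied from the table above.
top_counts = {
    202111: 14585, 202110: 14514, 202112: 14514, 202201: 14332,
    202107: 14272, 202203: 14226, 202204: 14189, 202108: 14122,
    202109: 14117, 202202: 14117,
}

def share(count: int, total: int = total_rows) -> str:
    """Format a count as a percentage of all rows, one decimal place."""
    return f"{100 * count / total:.1f}%"

# Rows not covered by the ten listed values.
other_count = total_rows - sum(top_counts.values())

for value, count in top_counts.items():
    print(value, count, share(count))
print("Other values", other_count, share(other_count))
```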
| Value | Count | Frequency (%) |
| 202101 | 12757 | 5.0% |
| 202103 | 13733 | 5.4% |
| 202104 | 13941 | 5.5% |
| 202105 | 14103 | 5.6% |
| 202106 | 14055 | 5.6% |
| 202107 | 14272 | 5.6% |
| 202108 | 14122 | 5.6% |
| 202109 | 14117 | 5.6% |
| 202110 | 14514 | 5.7% |
| 202111 | 14585 | 5.8% |
| Value | Count | Frequency (%) |
| 202207 | 13581 | 5.4% |
| 202206 | 13883 | 5.5% |
| 202205 | 14114 | 5.6% |
| 202204 | 14189 | 5.6% |
| 202203 | 14226 | 5.6% |
| 202202 | 14117 | 5.6% |
| 202201 | 14332 | 5.7% |
| 202112 | 14514 | 5.7% |
| 202111 | 14585 | 5.8% |
| 202110 | 14514 | 5.7% |
Store_name
| Distinct | 17738 |
|---|---|
| Distinct (%) | 7.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.9 MiB |
| 으뜸아이스크림할인점 | 57 |
|---|---|
| 한라식당 | 54 |
| 30년할매닭발 | 54 |
| 고사리식당 | 53 |
| 봉봉 | 40 |
| Other values (17733) | 252897 |
Length
| Max length | 38 |
|---|---|
| Median length | 30 |
| Mean length | 5.990938358 |
| Min length | 1 |
Characters and Unicode
| Total characters | 1516636 |
|---|---|
| Distinct characters | 1236 |
| Distinct categories | 10 |
| Distinct scripts | 4 |
| Distinct blocks | 3 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 429 |
|---|---|
| Unique (%) | 0.2% |
Sample
| 1st row | 한국맥도날드(유)제주노형점 |
|---|---|
| 2nd row | 버거킹제주이마트점 |
| 3rd row | 투썸플레이스 제주노형오거리점 |
| 4th row | 뚜레쥬르 노형오거리점 |
| 5th row | 에이바우트커피아이파크점 |
Common Values
| Value | Count | Frequency (%) |
| 으뜸아이스크림할인점 | 57 | < 0.1% |
| 한라식당 | 54 | < 0.1% |
| 30년할매닭발 | 54 | < 0.1% |
| 고사리식당 | 53 | < 0.1% |
| 봉봉 | 40 | < 0.1% |
| 가람 | 39 | < 0.1% |
| 통큰코다리 | 37 | < 0.1% |
| 봄봄 | 37 | < 0.1% |
| 왕천파닭 | 37 | < 0.1% |
| 쭈꾸쭈꾸쭈꾸미 | 36 | < 0.1% |
| Other values (17728) | 252711 | 99.8% |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| 주식회사 | 2889 | 1.0% |
| 노형점 | 601 | 0.2% |
| 서귀포점 | 538 | 0.2% |
| 제주 | 412 | 0.1% |
| 연동점 | 408 | 0.1% |
| 제주점 | 404 | 0.1% |
| 중문점 | 380 | 0.1% |
| 삼화점 | 365 | 0.1% |
| 아라점 | 364 | 0.1% |
| 신제주점 | 364 | 0.1% |
| Other values (18033) | 276534 | 97.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| 점 | 55973 | 3.7% |
| 주 | 36073 | 2.4% |
| 이 | 34891 | 2.3% |
| (space) | 31261 | 2.1% |
| 제 | 30861 | 2.0% |
| 식 | 23383 | 1.5% |
| 리 | 21204 | 1.4% |
| 스 | 19757 | 1.3% |
| 당 | 18178 | 1.2% |
| 도 | 17420 | 1.1% |
| Other values (1226) | 1227635 | 80.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| Other Letter | 1432016 | 94.4% |
| Space Separator | 31261 | 2.1% |
| Decimal Number | 18373 | 1.2% |
| Uppercase Letter | 10686 | 0.7% |
| Lowercase Letter | 10554 | 0.7% |
| Close Punctuation | 5906 | 0.4% |
| Open Punctuation | 5903 | 0.4% |
| Other Punctuation | 1754 | 0.1% |
| Dash Punctuation | 149 | < 0.1% |
| Connector Punctuation | 34 | < 0.1% |
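The category names in this table are Unicode general categories. As an illustration of how such counts can be obtained, the sketch below uses Python's standard `unicodedata` module on the first sample row shown earlier. The helper `category_counts` and the code-to-name mapping are written for this example and are not taken from the profiling tool.

```python
import unicodedata
from collections import Counter

# Short general-category codes mapped to the long names used in the report.
CATEGORY_NAMES = {
    "Lo": "Other Letter",
    "Zs": "Space Separator",
    "Nd": "Decimal Number",
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Pc": "Connector Punctuation",
}

def category_counts(text: str) -> Counter:
    """Count the characters of `text` by Unicode general category."""
    counts = Counter()
    for ch in text:
        code = unicodedata.category(ch)
        counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

sample = "한국맥도날드(유)제주노형점"  # 1st sample row of this variable
counts = category_counts(sample)
print(counts)
```

Hangul syllables fall under "Other Letter", which is why that category dominates the table above.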
Most frequent character per category
Other Letter
| Value | Count | Frequency (%) |
| 점 | 55973 | 3.9% |
| 주 | 36073 | 2.5% |
| 이 | 34891 | 2.4% |
| 제 | 30861 | 2.2% |
| 식 | 23383 | 1.6% |
| 리 | 21204 | 1.5% |
| 스 | 19757 | 1.4% |
| 당 | 18178 | 1.3% |
| 도 | 17420 | 1.2% |
| 가 | 15722 | 1.1% |
| Other values (1150) | 1158554 |
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 1387 | 13.1% |
| o | 1319 | 12.5% |
| a | 1130 | 10.7% |
| r | 621 | 5.9% |
| i | 615 | 5.8% |
| t | 522 | 4.9% |
| f | 470 | 4.5% |
| n | 447 | 4.2% |
| s | 444 | 4.2% |
| u | 412 | 3.9% |
| Other values (16) | 3187 |
Uppercase Letter
| Value | Count | Frequency (%) |
| A | 979 | 9.2% |
| O | 807 | 7.6% |
| E | 738 | 6.9% |
| T | 684 | 6.4% |
| B | 638 | 6.0% |
| D | 552 | 5.2% |
| N | 547 | 5.1% |
| S | 513 | 4.8% |
| L | 495 | 4.6% |
| C | 494 | 4.6% |
| Other values (16) | 4239 |
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 3521 | 19.2% |
| 1 | 3360 | 18.3% |
| 0 | 2618 | 14.2% |
| 3 | 1970 | 10.7% |
| 9 | 1380 | 7.5% |
| 4 | 1337 | 7.3% |
| 7 | 1239 | 6.7% |
| 5 | 1143 | 6.2% |
| 8 | 1084 | 5.9% |
| 6 | 721 | 3.9% |
Other Punctuation
| Value | Count | Frequency (%) |
| & | 662 | 37.7% |
| . | 661 | 37.7% |
| , | 173 | 9.9% |
| ? | 84 | 4.8% |
| / | 71 | 4.0% |
| ' | 63 | 3.6% |
| : | 18 | 1.0% |
| ! | 11 | 0.6% |
| ; | 11 | 0.6% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 31261 | 100.0% |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 5906 | 100.0% |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 5903 | 100.0% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 149 | 100.0% |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 34 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Hangul | 1431917 | 94.4% |
| Common | 63380 | 4.2% |
| Latin | 21240 | 1.4% |
| Han | 99 | < 0.1% |
Most frequent character per script
Hangul
| Value | Count | Frequency (%) |
| 점 | 55973 | 3.9% |
| 주 | 36073 | 2.5% |
| 이 | 34891 | 2.4% |
| 제 | 30861 | 2.2% |
| 식 | 23383 | 1.6% |
| 리 | 21204 | 1.5% |
| 스 | 19757 | 1.4% |
| 당 | 18178 | 1.3% |
| 도 | 17420 | 1.2% |
| 가 | 15722 | 1.1% |
| Other values (1145) | 1158455 |
Latin
| Value | Count | Frequency (%) |
| e | 1387 | 6.5% |
| o | 1319 | 6.2% |
| a | 1130 | 5.3% |
| A | 979 | 4.6% |
| O | 807 | 3.8% |
| E | 738 | 3.5% |
| T | 684 | 3.2% |
| B | 638 | 3.0% |
| r | 621 | 2.9% |
| i | 615 | 2.9% |
| Other values (42) | 12322 |
Common
| Value | Count | Frequency (%) |
| (space) | 31261 | 49.3% |
| ) | 5906 | 9.3% |
| ( | 5903 | 9.3% |
| 2 | 3521 | 5.6% |
| 1 | 3360 | 5.3% |
| 0 | 2618 | 4.1% |
| 3 | 1970 | 3.1% |
| 9 | 1380 | 2.2% |
| 4 | 1337 | 2.1% |
| 7 | 1239 | 2.0% |
| Other values (14) | 4885 | 7.7% |
Han
| Value | Count | Frequency (%) |
| 日 | 36 | 36.4% |
| 德 | 18 | 18.2% |
| 人 | 15 | 15.2% |
| 宜 | 15 | 15.2% |
| 愛 | 15 | 15.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
| Hangul | 1431917 | 94.4% |
| ASCII | 84620 | 5.6% |
| CJK | 99 | < 0.1% |
Most frequent character per block
Hangul
| Value | Count | Frequency (%) |
| 점 | 55973 | 3.9% |
| 주 | 36073 | 2.5% |
| 이 | 34891 | 2.4% |
| 제 | 30861 | 2.2% |
| 식 | 23383 | 1.6% |
| 리 | 21204 | 1.5% |
| 스 | 19757 | 1.4% |
| 당 | 18178 | 1.3% |
| 도 | 17420 | 1.2% |
| 가 | 15722 | 1.1% |
| Other values (1145) | 1158455 |
ASCII
| Value | Count | Frequency (%) |
| (space) | 31261 | 36.9% |
| ) | 5906 | 7.0% |
| ( | 5903 | 7.0% |
| 2 | 3521 | 4.2% |
| 1 | 3360 | 4.0% |
| 0 | 2618 | 3.1% |
| 3 | 1970 | 2.3% |
| e | 1387 | 1.6% |
| 9 | 1380 | 1.6% |
| 4 | 1337 | 1.6% |
| Other values (66) | 25977 |
CJK
| Value | Count | Frequency (%) |
| 日 | 36 | 36.4% |
| 德 | 18 | 18.2% |
| 人 | 15 | 15.2% |
| 宜 | 15 | 15.2% |
| 愛 | 15 | 15.2% |
si_gune_gu
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.9 MiB |
| 제주시 | 175748 |
|---|---|
| 서귀포시 | 77407 |
Length
| Max length | 4 |
|---|---|
| Median length | 3 |
| Mean length | 3.305769193 |
| Min length | 3 |
Characters and Unicode
| Total characters | 836872 |
|---|---|
| Distinct characters | 6 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 제주시 |
|---|---|
| 2nd row | 제주시 |
| 3rd row | 제주시 |
| 4th row | 제주시 |
| 5th row | 제주시 |
Common Values
| Value | Count | Frequency (%) |
| 제주시 | 175748 | 69.4% |
| 서귀포시 | 77407 | 30.6% |
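The length and character totals reported for this variable can be re-derived from the two value counts. The sketch below assumes the strings are stored as precomposed Hangul syllables, so that Python's `len()` counts one character per syllable.

```python
# Value counts copied from the "Common Values" table above.
counts = {"제주시": 175748, "서귀포시": 77407}

total_rows = sum(counts.values())
total_chars = sum(len(value) * n for value, n in counts.items())
mean_length = total_chars / total_rows

print(total_rows, total_chars, round(mean_length, 9))
```

This reproduces the reported 836872 total characters and a mean length of about 3.3058.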
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| 제주시 | 175748 | 69.4% |
| 서귀포시 | 77407 | 30.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| 시 | 253155 | 30.3% |
| 제 | 175748 | 21.0% |
| 주 | 175748 | 21.0% |
| 서 | 77407 | 9.2% |
| 귀 | 77407 | 9.2% |
| 포 | 77407 | 9.2% |
Most occurring categories
| Value | Count | Frequency (%) |
| Other Letter | 836872 | 100.0% |
Most frequent character per category
Other Letter
| Value | Count | Frequency (%) |
| 시 | 253155 | 30.3% |
| 제 | 175748 | 21.0% |
| 주 | 175748 | 21.0% |
| 서 | 77407 | 9.2% |
| 귀 | 77407 | 9.2% |
| 포 | 77407 | 9.2% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Hangul | 836872 | 100.0% |
Most frequent character per script
Hangul
| Value | Count | Frequency (%) |
| 시 | 253155 | 30.3% |
| 제 | 175748 | 21.0% |
| 주 | 175748 | 21.0% |
| 서 | 77407 | 9.2% |
| 귀 | 77407 | 9.2% |
| 포 | 77407 | 9.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
| Hangul | 836872 | 100.0% |
Most frequent character per block
Hangul
| Value | Count | Frequency (%) |
| 시 | 253155 | 30.3% |
| 제 | 175748 | 21.0% |
| 주 | 175748 | 21.0% |
| 서 | 77407 | 9.2% |
| 귀 | 77407 | 9.2% |
| 포 | 77407 | 9.2% |
Dong
| Distinct | 43 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.9 MiB |
| 이도2동 | 21668 |
|---|---|
| 연동 | 20889 |
| 노형동 | 18107 |
| 애월읍 | 13191 |
| 한림읍 | 10892 |
| Other values (38) | 168408 |
Length
| Max length | 4 |
|---|---|
| Median length | 3 |
| Mean length | 3.122889139 |
| Min length | 2 |
Characters and Unicode
| Total characters | 790575 |
|---|---|
| Distinct characters | 63 |
| Distinct categories | 2 |
| Distinct scripts | 2 |
| Distinct blocks | 2 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 노형동 |
|---|---|
| 2nd row | 노형동 |
| 3rd row | 노형동 |
| 4th row | 노형동 |
| 5th row | 노형동 |
Common Values
| Value | Count | Frequency (%) |
| 이도2동 | 21668 | 8.6% |
| 연동 | 20889 | 8.3% |
| 노형동 | 18107 | 7.2% |
| 애월읍 | 13191 | 5.2% |
| 한림읍 | 10892 | 4.3% |
| 구좌읍 | 10158 | 4.0% |
| 조천읍 | 10150 | 4.0% |
| 성산읍 | 9246 | 3.7% |
| 아라동 | 9045 | 3.6% |
| 일도2동 | 8555 | 3.4% |
| Other values (33) | 121254 |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| 이도2동 | 21668 | 8.6% |
| 연동 | 20889 | 8.3% |
| 노형동 | 18107 | 7.2% |
| 애월읍 | 13191 | 5.2% |
| 한림읍 | 10892 | 4.3% |
| 구좌읍 | 10158 | 4.0% |
| 조천읍 | 10150 | 4.0% |
| 성산읍 | 9246 | 3.7% |
| 아라동 | 9045 | 3.6% |
| 일도2동 | 8555 | 3.4% |
| Other values (33) | 121254 |
Most occurring characters
| Value | Count | Frequency (%) |
| 동 | 172113 | 21.8% |
| 읍 | 67248 | 8.5% |
| 도 | 54273 | 6.9% |
| 2 | 38717 | 4.9% |
| 이 | 26516 | 3.4% |
| 연 | 20889 | 2.6% |
| 면 | 19355 | 2.4% |
| 천 | 18810 | 2.4% |
| 노 | 18107 | 2.3% |
| 형 | 18107 | 2.3% |
| Other values (53) | 336440 |
Most occurring categories
| Value | Count | Frequency (%) |
| Other Letter | 738576 | 93.4% |
| Decimal Number | 51999 | 6.6% |
Most frequent character per category
Other Letter
| Value | Count | Frequency (%) |
| 동 | 172113 | 23.3% |
| 읍 | 67248 | 9.1% |
| 도 | 54273 | 7.3% |
| 이 | 26516 | 3.6% |
| 연 | 20889 | 2.8% |
| 면 | 19355 | 2.6% |
| 천 | 18810 | 2.5% |
| 노 | 18107 | 2.5% |
| 형 | 18107 | 2.5% |
| 대 | 17060 | 2.3% |
| Other values (51) | 306098 |
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 38717 | 74.5% |
| 1 | 13282 | 25.5% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Hangul | 738576 | |
| Common | 51999 | 6.6% |
Most frequent character per script
Hangul
| Value | Count | Frequency (%) |
| 동 | 172113 | |
| 읍 | 67248 | 9.1% |
| 도 | 54273 | 7.3% |
| 이 | 26516 | 3.6% |
| 연 | 20889 | 2.8% |
| 면 | 19355 | 2.6% |
| 천 | 18810 | 2.5% |
| 노 | 18107 | 2.5% |
| 형 | 18107 | 2.5% |
| 대 | 17060 | 2.3% |
| Other values (51) | 306098 |
Common
| Value | Count | Frequency (%) |
| 2 | 38717 | |
| 1 | 13282 | 25.5% |
Most occurring blocks
| Value | Count | Frequency (%) |
| Hangul | 738576 | |
| ASCII | 51999 | 6.6% |
Most frequent character per block
Hangul
| Value | Count | Frequency (%) |
| 동 | 172113 | |
| 읍 | 67248 | 9.1% |
| 도 | 54273 | 7.3% |
| 이 | 26516 | 3.6% |
| 연 | 20889 | 2.8% |
| 면 | 19355 | 2.6% |
| 천 | 18810 | 2.5% |
| 노 | 18107 | 2.5% |
| 형 | 18107 | 2.5% |
| 대 | 17060 | 2.3% |
| Other values (51) | 306098 |
ASCII
| Value | Count | Frequency (%) |
| 2 | 38717 | |
| 1 | 13282 | 25.5% |
loc
| Distinct | 16 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.9 MiB |
| 제주시 동지역 | 124484 |
|---|---|
| 서귀포시 동지역 | 41458 |
| 애월읍 | 13191 |
| 한림읍 | 10892 |
| 구좌읍 | 10158 |
| Other values (11) | 52972 |
Length
| Max length | 8 |
|---|---|
| Median length | 7 |
| Mean length | 5.785747862 |
| Min length | 3 |
Characters and Unicode
| Total characters | 1464691 |
|---|---|
| Distinct characters | 39 |
| Distinct categories | 2 |
| Distinct scripts | 2 |
| Distinct blocks | 2 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 제주시 동지역 |
|---|---|
| 2nd row | 제주시 동지역 |
| 3rd row | 제주시 동지역 |
| 4th row | 제주시 동지역 |
| 5th row | 제주시 동지역 |
Common Values
| Value | Count | Frequency (%) |
| 제주시 동지역 | 124484 | 49.2% |
| 서귀포시 동지역 | 41458 | 16.4% |
| 애월읍 | 13191 | 5.2% |
| 한림읍 | 10892 | 4.3% |
| 구좌읍 | 10158 | 4.0% |
| 조천읍 | 10150 | 4.0% |
| 성산읍 | 9246 | 3.7% |
| 대정읍 | 8322 | 3.3% |
| 안덕면 | 6974 | 2.8% |
| 표선면 | 6118 | 2.4% |
| Other values (6) | 12162 | 4.8% |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| 동지역 | 165942 | 39.6% |
| 제주시 | 124484 | 29.7% |
| 서귀포시 | 41458 | 9.9% |
| 애월읍 | 13191 | 3.1% |
| 한림읍 | 10892 | 2.6% |
| 구좌읍 | 10158 | 2.4% |
| 조천읍 | 10150 | 2.4% |
| 성산읍 | 9246 | 2.2% |
| 대정읍 | 8322 | 2.0% |
| 안덕면 | 6974 | 1.7% |
| Other values (7) | 18280 | 4.4% |
Most occurring characters
| Value | Count | Frequency (%) |
| 동 | 166552 | 11.4% |
| 역 | 165942 | 11.3% |
| 시 | 165942 | 11.3% |
| (space) | 165942 | 11.3% |
| 지 | 165942 | 11.3% |
| 제 | 124484 | 8.5% |
| 주 | 124484 | 8.5% |
| 읍 | 67248 | 4.6% |
| 서 | 41458 | 2.8% |
| 귀 | 41458 | 2.8% |
| Other values (29) | 235239 |
Most occurring categories
| Value | Count | Frequency (%) |
| Other Letter | 1298749 | 88.7% |
| Space Separator | 165942 | 11.3% |
Most frequent character per category
Other Letter
| Value | Count | Frequency (%) |
| 동 | 166552 | 12.8% |
| 역 | 165942 | 12.8% |
| 시 | 165942 | 12.8% |
| 지 | 165942 | 12.8% |
| 제 | 124484 | 9.6% |
| 주 | 124484 | 9.6% |
| 읍 | 67248 | 5.2% |
| 서 | 41458 | 3.2% |
| 귀 | 41458 | 3.2% |
| 포 | 41458 | 3.2% |
| Other values (28) | 193781 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 165942 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Hangul | 1298749 | |
| Common | 165942 | 11.3% |
Most frequent character per script
Hangul
| Value | Count | Frequency (%) |
| 동 | 166552 | |
| 역 | 165942 | |
| 시 | 165942 | |
| 지 | 165942 | |
| 제 | 124484 | |
| 주 | 124484 | |
| 읍 | 67248 | 5.2% |
| 서 | 41458 | 3.2% |
| 귀 | 41458 | 3.2% |
| 포 | 41458 | 3.2% |
| Other values (28) | 193781 |
Common
| Value | Count | Frequency (%) |
| (space) | 165942 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| Hangul | 1298749 | |
| ASCII | 165942 | 11.3% |
Most frequent character per block
Hangul
| Value | Count | Frequency (%) |
| 동 | 166552 | |
| 역 | 165942 | |
| 시 | 165942 | |
| 지 | 165942 | |
| 제 | 124484 | |
| 주 | 124484 | |
| 읍 | 67248 | 5.2% |
| 서 | 41458 | 3.2% |
| 귀 | 41458 | 3.2% |
| 포 | 41458 | 3.2% |
| Other values (28) | 193781 |
ASCII
| Value | Count | Frequency (%) |
| (space) | 165942 | 100.0% |
cate1
| Distinct | 9 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.9 MiB |
| 한식 | 180747 |
|---|---|
| 간식 | 16419 |
| 음료 | 14677 |
| 아시아음식 | 13909 |
| 패스트푸드 | 13776 |
| Other values (4) | 13627 |
Length
| Max length | 9 |
|---|---|
| Median length | 2 |
| Mean length | 2.447346487 |
| Min length | 2 |
Characters and Unicode
| Total characters | 619558 |
|---|---|
| Distinct characters | 22 |
| Distinct categories | 2 |
| Distinct scripts | 2 |
| Distinct blocks | 2 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 패스트푸드 |
|---|---|
| 2nd row | 패스트푸드 |
| 3rd row | 음료 |
| 4th row | 간식 |
| 5th row | 음료 |
Common Values
| Value | Count | Frequency (%) |
| 한식 | 180747 | 71.4% |
| 간식 | 16419 | 6.5% |
| 음료 | 14677 | 5.8% |
| 아시아음식 | 13909 | 5.5% |
| 패스트푸드 | 13776 | 5.4% |
| 양식 | 7754 | 3.1% |
| 주점및주류판매 | 4752 | 1.9% |
| 주점 및 주류판매 | 919 | 0.4% |
| 부페 | 202 | 0.1% |
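Two of the nine labels above, 주점및주류판매 and 주점 및 주류판매, differ only in spacing. If they denote the same category (an assumption; the report itself treats them as distinct), the spellings could be merged before further analysis. A minimal sketch with a hypothetical `normalize` helper:

```python
from collections import Counter

# Counts copied from the "Common Values" table above.
counts = Counter({
    "한식": 180747,
    "간식": 16419,
    "음료": 14677,
    "아시아음식": 13909,
    "패스트푸드": 13776,
    "양식": 7754,
    "주점및주류판매": 4752,
    "주점 및 주류판매": 919,
    "부페": 202,
})

def normalize(label: str) -> str:
    """Drop all whitespace so spacing variants collapse into one label."""
    return "".join(label.split())

merged = Counter()
for label, n in counts.items():
    merged[normalize(label)] += n

print(len(counts), "labels before,", len(merged), "after")
print(merged["주점및주류판매"])
```

Merging leaves eight labels, with the two spellings combined into 5671 rows, and the overall row count is unchanged.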
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| 한식 | 180747 | 70.9% |
| 간식 | 16419 | 6.4% |
| 음료 | 14677 | 5.8% |
| 아시아음식 | 13909 | 5.5% |
| 패스트푸드 | 13776 | 5.4% |
| 양식 | 7754 | 3.0% |
| 주점및주류판매 | 4752 | 1.9% |
| 주점 | 919 | 0.4% |
| 및 | 919 | 0.4% |
| 주류판매 | 919 | 0.4% |
Most occurring characters
| Value | Count | Frequency (%) |
| 식 | 218829 | 35.3% |
| 한 | 180747 | 29.2% |
| 음 | 28586 | 4.6% |
| 아 | 27818 | 4.5% |
| 간 | 16419 | 2.7% |
| 료 | 14677 | 2.4% |
| 시 | 13909 | 2.2% |
| 트 | 13776 | 2.2% |
| 푸 | 13776 | 2.2% |
| 드 | 13776 | 2.2% |
| Other values (12) | 77245 | 12.5% |
Most occurring categories
| Value | Count | Frequency (%) |
| Other Letter | 617720 | 99.7% |
| Space Separator | 1838 | 0.3% |
Most frequent character per category
Other Letter
| Value | Count | Frequency (%) |
| 식 | 218829 | 35.4% |
| 한 | 180747 | 29.3% |
| 음 | 28586 | 4.6% |
| 아 | 27818 | 4.5% |
| 간 | 16419 | 2.7% |
| 료 | 14677 | 2.4% |
| 시 | 13909 | 2.3% |
| 트 | 13776 | 2.2% |
| 푸 | 13776 | 2.2% |
| 드 | 13776 | 2.2% |
| Other values (11) | 75407 | 12.2% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 1838 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Hangul | 617720 | |
| Common | 1838 | 0.3% |
Most frequent character per script
Hangul
| Value | Count | Frequency (%) |
| 식 | 218829 | |
| 한 | 180747 | |
| 음 | 28586 | 4.6% |
| 아 | 27818 | 4.5% |
| 간 | 16419 | 2.7% |
| 료 | 14677 | 2.4% |
| 시 | 13909 | 2.3% |
| 트 | 13776 | 2.2% |
| 푸 | 13776 | 2.2% |
| 드 | 13776 | 2.2% |
| Other values (11) | 75407 | 12.2% |
Common
| Value | Count | Frequency (%) |
| (space) | 1838 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| Hangul | 617720 | |
| ASCII | 1838 | 0.3% |
Most frequent character per block
Hangul
| Value | Count | Frequency (%) |
| 식 | 218829 | |
| 한 | 180747 | |
| 음 | 28586 | 4.6% |
| 아 | 27818 | 4.5% |
| 간 | 16419 | 2.7% |
| 료 | 14677 | 2.4% |
| 시 | 13909 | 2.3% |
| 트 | 13776 | 2.2% |
| 푸 | 13776 | 2.2% |
| 드 | 13776 | 2.2% |
| Other values (11) | 75407 | 12.2% |
ASCII
| Value | Count | Frequency (%) |
| (space) | 1838 | 100.0% |
cate2
| Distinct | 74 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.9 MiB |
| 가정식 | 74816 |
|---|---|
| 단품요리 전문 | 73696 |
| 커피 | 17289 |
| 치킨 | 11185 |
| 베이커리 | 7493 |
| Other values (69) | 68676 |
Length
| Max length | 9 |
|---|---|
| Median length | 8 |
| Mean length | 4.035432838 |
| Min length | 1 |
Characters and Unicode
| Total characters | 1021590 |
|---|---|
| Distinct characters | 134 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 2 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 햄버거 |
|---|---|
| 2nd row | 햄버거 |
| 3rd row | 커피 |
| 4th row | 베이커리 |
| 5th row | 커피 |
Common Values
| Value | Count | Frequency (%) |
| 가정식 | 74816 | 29.6% |
| 단품요리 전문 | 73696 | 29.1% |
| 커피 | 17289 | 6.8% |
| 치킨 | 11185 | 4.4% |
| 베이커리 | 7493 | 3.0% |
| 중식 | 6938 | 2.7% |
| 양식 | 6782 | 2.7% |
| 일식 | 6234 | 2.5% |
| 분식 | 5960 | 2.4% |
| 돼지고기 | 4939 | 2.0% |
| Other values (64) | 37823 |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| 가정식 | 74816 | 22.9% |
| 단품요리 | 73696 | 22.5% |
| 전문 | 73696 | 22.5% |
| 커피 | 17289 | 5.3% |
| 치킨 | 11185 | 3.4% |
| 베이커리 | 7493 | 2.3% |
| 중식 | 6938 | 2.1% |
| 양식 | 6782 | 2.1% |
| 일식 | 6234 | 1.9% |
| 분식 | 5960 | 1.8% |
| Other values (66) | 43035 |
Most occurring characters
| Value | Count | Frequency (%) |
| 식 | 101689 | 10.0% |
| 리 | 84466 | 8.3% |
| 요 | 76298 | 7.5% |
| 가 | 74816 | 7.3% |
| 정 | 74816 | 7.3% |
| (space) | 73969 | 7.2% |
| 단 | 73696 | 7.2% |
| 품 | 73696 | 7.2% |
| 전 | 73696 | 7.2% |
| 문 | 73696 | 7.2% |
| Other values (124) | 240752 |
Most occurring categories
| Value | Count | Frequency (%) |
| Other Letter | 940875 | 92.1% |
| Space Separator | 73969 | 7.2% |
| Other Punctuation | 6746 | 0.7% |
Most frequent character per category
Other Letter
| Value | Count | Frequency (%) |
| 식 | 101689 | 10.8% |
| 리 | 84466 | 9.0% |
| 요 | 76298 | 8.1% |
| 가 | 74816 | 8.0% |
| 정 | 74816 | 8.0% |
| 단 | 73696 | 7.8% |
| 품 | 73696 | 7.8% |
| 전 | 73696 | 7.8% |
| 문 | 73696 | 7.8% |
| 커 | 24782 | 2.6% |
| Other values (122) | 209224 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 73969 | 100.0% |
Other Punctuation
| Value | Count | Frequency (%) |
| / | 6746 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Hangul | 940875 | |
| Common | 80715 | 7.9% |
Most frequent character per script
Hangul
| Value | Count | Frequency (%) |
| 식 | 101689 | |
| 리 | 84466 | |
| 요 | 76298 | 8.1% |
| 가 | 74816 | 8.0% |
| 정 | 74816 | 8.0% |
| 단 | 73696 | 7.8% |
| 품 | 73696 | 7.8% |
| 전 | 73696 | 7.8% |
| 문 | 73696 | 7.8% |
| 커 | 24782 | 2.6% |
| Other values (122) | 209224 |
Common
| Value | Count | Frequency (%) |
| (space) | 73969 | 91.6% |
| / | 6746 | 8.4% |
Most occurring blocks
| Value | Count | Frequency (%) |
| Hangul | 940875 | |
| ASCII | 80715 | 7.9% |
Most frequent character per block
Hangul
| Value | Count | Frequency (%) |
| 식 | 101689 | |
| 리 | 84466 | |
| 요 | 76298 | 8.1% |
| 가 | 74816 | 8.0% |
| 정 | 74816 | 8.0% |
| 단 | 73696 | 7.8% |
| 품 | 73696 | 7.8% |
| 전 | 73696 | 7.8% |
| 문 | 73696 | 7.8% |
| 커 | 24782 | 2.6% |
| Other values (122) | 209224 |
ASCII
| Value | Count | Frequency (%) |
| (space) | 73969 | 91.6% |
| / | 6746 | 8.4% |
jeju_person_sales
| Distinct | 1387 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.6848708104 |
| Minimum | 0 |
| Maximum | 46.16 |
| Zeros | 12767 |
| Zeros (%) | 5.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.9 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0.11 |
| median | 0.33 |
| Q3 | 0.8 |
| 95-th percentile | 2.46 |
| Maximum | 46.16 |
| Range | 46.16 |
| Interquartile range (IQR) | 0.69 |
Descriptive statistics
| Standard deviation | 1.224056655 |
|---|---|
| Coefficient of variation (CV) | 1.7872811 |
| Kurtosis | 156.8893962 |
| Mean | 0.6848708104 |
| Median Absolute Deviation (MAD) | 0.27 |
| Skewness | 8.774687411 |
| Sum | 173378.47 |
| Variance | 1.498314696 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 12767 | 5.0% |
| 0.03 | 5471 | 2.2% |
| 0.02 | 5427 | 2.1% |
| 0.04 | 5305 | 2.1% |
| 0.06 | 5006 | 2.0% |
| 0.05 | 4930 | 1.9% |
| 0.01 | 4784 | 1.9% |
| 0.07 | 4703 | 1.9% |
| 0.08 | 4578 | 1.8% |
| 0.09 | 4269 | 1.7% |
| Other values (1377) | 195915 |
| Value | Count | Frequency (%) |
| 0 | 12767 | 5.0% |
| 0.01 | 4784 | 1.9% |
| 0.02 | 5427 | 2.1% |
| 0.03 | 5471 | 2.2% |
| 0.04 | 5305 | 2.1% |
| 0.05 | 4930 | 1.9% |
| 0.06 | 5006 | 2.0% |
| 0.07 | 4703 | 1.9% |
| 0.08 | 4578 | 1.8% |
| 0.09 | 4269 | 1.7% |
| Value | Count | Frequency (%) |
| 46.16 | 1 | < 0.1% |
| 42.63 | 1 | < 0.1% |
| 42 | 1 | < 0.1% |
| 41.87 | 1 | < 0.1% |
| 41.66 | 1 | < 0.1% |
| 41.56 | 1 | < 0.1% |
| 39.23 | 1 | < 0.1% |
| 38.17 | 1 | < 0.1% |
| 37.99 | 1 | < 0.1% |
| 37.24 | 1 | < 0.1% |
jeju_person_sales_num
| Distinct | 1932 |
|---|---|
| Distinct (%) | 0.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.8912782288 |
| Minimum | 0 |
| Maximum | 102.4 |
| Zeros | 11661 |
| Zeros (%) | 4.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.9 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.02 |
| Q1 | 0.11 |
| median | 0.34 |
| Q3 | 0.87 |
| 95-th percentile | 3.39 |
| Maximum | 102.4 |
| Range | 102.4 |
| Interquartile range (IQR) | 0.76 |
Descriptive statistics
| Standard deviation | 2.316831634 |
|---|---|
| Coefficient of variation (CV) | 2.599448252 |
| Kurtosis | 639.4458283 |
| Mean | 0.8912782288 |
| Median Absolute Deviation (MAD) | 0.28 |
| Skewness | 18.68913712 |
| Sum | 225631.54 |
| Variance | 5.367708821 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0.03 | 14703 | 5.8% |
| 0 | 11661 | 4.6% |
| 0.05 | 7845 | 3.1% |
| 0.06 | 6429 | 2.5% |
| 0.08 | 6328 | 2.5% |
| 0.11 | 5983 | 2.4% |
| 0.13 | 5896 | 2.3% |
| 0.09 | 4917 | 1.9% |
| 0.16 | 4422 | 1.7% |
| 0.14 | 4244 | 1.7% |
| Other values (1922) | 180727 |
| Value | Count | Frequency (%) |
| 0 | 11661 | 4.6% |
| 0.01 | 147 | 0.1% |
| 0.02 | 1753 | 0.7% |
| 0.03 | 14703 | 5.8% |
| 0.04 | 111 | < 0.1% |
| 0.05 | 7845 | 3.1% |
| 0.06 | 6429 | 2.5% |
| 0.07 | 682 | 0.3% |
| 0.08 | 6328 | 2.5% |
| 0.09 | 4917 | 1.9% |
| Value | Count | Frequency (%) |
| 102.4 | 1 | < 0.1% |
| 102.28 | 1 | < 0.1% |
| 102.13 | 1 | < 0.1% |
| 102.05 | 1 | < 0.1% |
| 102.04 | 1 | < 0.1% |
| 102 | 2 | < 0.1% |
| 101.85 | 1 | < 0.1% |
| 101.76 | 1 | < 0.1% |
| 101.74 | 1 | < 0.1% |
| 101.73 | 1 | < 0.1% |
other_person_sales
| Distinct | 427 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.06022642255 |
| Minimum | 0 |
| Maximum | 12.07 |
| Zeros | 82421 |
| Zeros (%) | 32.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.9 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0.01 |
| Q3 | 0.04 |
| 95-th percentile | 0.26 |
| Maximum | 12.07 |
| Range | 12.07 |
| Interquartile range (IQR) | 0.04 |
Descriptive statistics
| Standard deviation | 0.2055094313 |
|---|---|
| Coefficient of variation (CV) | 3.412280235 |
| Kurtosis | 532.6483432 |
| Mean | 0.06022642255 |
| Median Absolute Deviation (MAD) | 0.01 |
| Skewness | 16.52111212 |
| Sum | 15246.62 |
| Variance | 0.04223412635 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 82421 | 32.6% |
| 0.01 | 53975 | 21.3% |
| 0.02 | 27091 | 10.7% |
| 0.03 | 16825 | 6.6% |
| 0.04 | 11402 | 4.5% |
| 0.05 | 8327 | 3.3% |
| 0.06 | 6310 | 2.5% |
| 0.07 | 5057 | 2.0% |
| 0.08 | 4101 | 1.6% |
| 0.09 | 3391 | 1.3% |
| Other values (417) | 34255 |
| Value | Count | Frequency (%) |
| 0 | 82421 | 32.6% |
| 0.01 | 53975 | 21.3% |
| 0.02 | 27091 | 10.7% |
| 0.03 | 16825 | 6.6% |
| 0.04 | 11402 | 4.5% |
| 0.05 | 8327 | 3.3% |
| 0.06 | 6310 | 2.5% |
| 0.07 | 5057 | 2.0% |
| 0.08 | 4101 | 1.6% |
| 0.09 | 3391 | 1.3% |
| Value | Count | Frequency (%) |
| 12.07 | 1 | < 0.1% |
| 11.79 | 1 | < 0.1% |
| 11.47 | 1 | < 0.1% |
| 10.99 | 1 | < 0.1% |
| 10.94 | 1 | < 0.1% |
| 10.81 | 1 | < 0.1% |
| 9.96 | 1 | < 0.1% |
| 9.9 | 1 | < 0.1% |
| 9.86 | 1 | < 0.1% |
| 9.7 | 1 | < 0.1% |
other_person_sales_num
| Distinct | 1568 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.3886901701 |
| Minimum | 0 |
| Maximum | 100 |
| Zeros | 27697 |
| Zeros (%) | 10.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.9 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0.02 |
| median | 0.09 |
| Q3 | 0.3 |
| 95-th percentile | 1.48 |
| Maximum | 100 |
| Range | 100 |
| Interquartile range (IQR) | 0.28 |
Descriptive statistics
| Standard deviation | 1.498642015 |
|---|---|
| Coefficient of variation (CV) | 3.855621084 |
| Kurtosis | 756.8612729 |
| Mean | 0.3886901701 |
| Median Absolute Deviation (MAD) | 0.08 |
| Skewness | 20.90775832 |
| Sum | 98398.86 |
| Variance | 2.245927888 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 27697 | 10.9% |
| 0.01 | 22308 | 8.8% |
| 0.02 | 15949 | 6.3% |
| 0.03 | 13980 | 5.5% |
| 0.04 | 13738 | 5.4% |
| 0.05 | 9217 | 3.6% |
| 0.07 | 7528 | 3.0% |
| 0.08 | 7426 | 2.9% |
| 0.06 | 7062 | 2.8% |
| 0.09 | 6153 | 2.4% |
| Other values (1558) | 122097 |
| Value | Count | Frequency (%) |
| 0 | 27697 | 10.9% |
| 0.01 | 22308 | 8.8% |
| 0.02 | 15949 | 6.3% |
| 0.03 | 13980 | 5.5% |
| 0.04 | 13738 | 5.4% |
| 0.05 | 9217 | 3.6% |
| 0.06 | 7062 | 2.8% |
| 0.07 | 7528 | 3.0% |
| 0.08 | 7426 | 2.9% |
| 0.09 | 6153 | 2.4% |
| Value | Count | Frequency (%) |
| 100 | 3 | < 0.1% |
| 90.9 | 1 | < 0.1% |
| 87.29 | 1 | < 0.1% |
| 75.01 | 1 | < 0.1% |
| 70.46 | 1 | < 0.1% |
| 68.88 | 1 | < 0.1% |
| 66.15 | 1 | < 0.1% |
| 65.64 | 1 | < 0.1% |
| 65.19 | 1 | < 0.1% |
| 62.84 | 1 | < 0.1% |
tot_sales
| Distinct | 437 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.1000497324 |
| Minimum | 0 |
| Maximum | 12.03 |
| Zeros | 21173 |
| Zeros (%) | 8.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.9 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0.02 |
| median | 0.04 |
| Q3 | 0.1 |
| 95-th percentile | 0.36 |
| Maximum | 12.03 |
| Range | 12.03 |
| Interquartile range (IQR) | 0.08 |
Descriptive statistics
| Standard deviation | 0.2259983061 |
|---|---|
| Coefficient of variation (CV) | 2.258859676 |
| Kurtosis | 377.6456646 |
| Mean | 0.1000497324 |
| Median Absolute Deviation (MAD) | 0.03 |
| Skewness | 13.48981139 |
| Sum | 25328.09 |
| Variance | 0.05107523434 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0.01 | 37104 | |
| 0.02 | 29307 | |
| 0.03 | 23172 | 9.2% |
| 0 | 21173 | 8.4% |
| 0.04 | 18815 | 7.4% |
| 0.05 | 15443 | 6.1% |
| 0.06 | 12720 | 5.0% |
| 0.07 | 10463 | 4.1% |
| 0.08 | 8999 | 3.6% |
| 0.09 | 7649 | 3.0% |
| Other values (427) | 68310 |
| Value | Count | Frequency (%) |
| 0 | 21173 | |
| 0.01 | 37104 | |
| 0.02 | 29307 | |
| 0.03 | 23172 | |
| 0.04 | 18815 | |
| 0.05 | 15443 | |
| 0.06 | 12720 | 5.0% |
| 0.07 | 10463 | 4.1% |
| 0.08 | 8999 | 3.6% |
| 0.09 | 7649 | 3.0% |
| Value | Count | Frequency (%) |
| 12.03 | 1 | |
| 11.72 | 1 | |
| 11.54 | 1 | |
| 11.01 | 1 | |
| 10.96 | 1 | |
| 10.8 | 1 | |
| 10.05 | 1 | |
| 10.01 | 1 | |
| 9.99 | 1 | |
| 9.8 | 1 |
| Distinct | 1809 |
|---|---|
| Distinct (%) | 0.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.6426482195 |
| Minimum | 0 |
|---|---|
| Maximum | 102.23 |
| Zeros | 496 |
| Zeros (%) | 0.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.9 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.02 |
| Q1 | 0.09 |
| median | 0.24 |
| Q3 | 0.6 |
| 95-th percentile | 2.34 |
| Maximum | 102.23 |
| Range | 102.23 |
| Interquartile range (IQR) | 0.51 |
Descriptive statistics
| Standard deviation | 1.796624623 |
|---|---|
| Coefficient of variation (CV) | 2.795657979 |
| Kurtosis | 498.726249 |
| Mean | 0.6426482195 |
| Median Absolute Deviation (MAD) | 0.18 |
| Skewness | 16.73239402 |
| Sum | 162689.61 |
| Variance | 3.227860035 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0.01 | 10143 | 4.0% |
| 0.03 | 8182 | 3.2% |
| 0.02 | 8031 | 3.2% |
| 0.07 | 7156 | 2.8% |
| 0.04 | 6997 | 2.8% |
| 0.05 | 6871 | 2.7% |
| 0.09 | 6046 | 2.4% |
| 0.06 | 5828 | 2.3% |
| 0.08 | 5461 | 2.2% |
| 0.12 | 5431 | 2.1% |
| Other values (1799) | 183009 |
| Value | Count | Frequency (%) |
| 0 | 496 | 0.2% |
| 0.01 | 10143 | |
| 0.02 | 8031 | |
| 0.03 | 8182 | |
| 0.04 | 6997 | |
| 0.05 | 6871 | |
| 0.06 | 5828 | |
| 0.07 | 7156 | |
| 0.08 | 5461 | |
| 0.09 | 6046 |
| Value | Count | Frequency (%) |
| 102.23 | 1 | |
| 100 | 1 | |
| 99.15 | 1 | |
| 94.59 | 1 | |
| 93.3 | 1 | |
| 90.95 | 1 | |
| 81.7 | 1 | |
| 77 | 1 | |
| 73.39 | 1 | |
| 71.03 | 1 |
| Distinct | 1741 |
|---|---|
| Distinct (%) | 0.8% |
| Missing | 26509 |
| Missing (%) | 10.5% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -0.08994246534 |
| Minimum | -62.92 |
|---|---|
| Maximum | 62.27 |
| Zeros | 10216 |
| Zeros (%) | 4.0% |
| Negative | 128371 |
| Negative (%) | 50.7% |
| Memory size | 1.9 MiB |
Quantile statistics
| Minimum | -62.92 |
|---|---|
| 5-th percentile | -0.81 |
| Q1 | -0.14 |
| median | -0.02 |
| Q3 | 0.05 |
| 95-th percentile | 0.45 |
| Maximum | 62.27 |
| Range | 125.19 |
| Interquartile range (IQR) | 0.19 |
Descriptive statistics
| Standard deviation | 0.9898825882 |
|---|---|
| Coefficient of variation (CV) | -11.0057311 |
| Kurtosis | 599.9457333 |
| Mean | -0.08994246534 |
| Median Absolute Deviation (MAD) | 0.09 |
| Skewness | -5.484058496 |
| Sum | -20385.1 |
| Variance | 0.9798675384 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| -0.01 | 10740 | 4.2% |
| 0 | 10216 | 4.0% |
| 0.01 | 9650 | 3.8% |
| -0.02 | 9011 | 3.6% |
| -0.03 | 7427 | 2.9% |
| 0.02 | 7423 | 2.9% |
| -0.04 | 6514 | 2.6% |
| 0.03 | 6261 | 2.5% |
| -0.05 | 5740 | 2.3% |
| 0.04 | 5465 | 2.2% |
| Other values (1731) | 148199 | |
| (Missing) | 26509 | 10.5% |
| Value | Count | Frequency (%) |
| -62.92 | 1 | |
| -54.6 | 1 | |
| -47.15 | 1 | |
| -47.02 | 1 | |
| -43.26 | 1 | |
| -40.62 | 1 | |
| -40.27 | 1 | |
| -39.57 | 1 | |
| -38.32 | 1 | |
| -36.87 | 1 |
| Value | Count | Frequency (%) |
| 62.27 | 1 | |
| 50.87 | 1 | |
| 47.35 | 1 | |
| 37.8 | 1 | |
| 35.7 | 1 | |
| 34.56 | 1 | |
| 33.57 | 1 | |
| 32.71 | 1 | |
| 29.75 | 1 | |
| 29.29 | 1 |
| Distinct | 18047 |
|---|---|
| Distinct (%) | 8.0% |
| Missing | 26509 |
| Missing (%) | 10.5% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 7484.957849 |
| Minimum | 0.01 |
|---|---|
| Maximum | 17450 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.9 MiB |
Quantile statistics
| Minimum | 0.01 |
|---|---|
| 5-th percentile | 0.67 |
| Q1 | 3001 |
| median | 7588 |
| Q3 | 11578 |
| 95-th percentile | 15465.75 |
| Maximum | 17450 |
| Range | 17449.99 |
| Interquartile range (IQR) | 8577 |
Descriptive statistics
| Standard deviation | 4918.986577 |
|---|---|
| Coefficient of variation (CV) | 0.6571829362 |
| Kurtosis | -1.165867871 |
| Mean | 7484.957849 |
| Median Absolute Deviation (MAD) | 4266 |
| Skewness | 0.04967723568 |
| Sum | 1696435757 |
| Variance | 24196428.94 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0.11 | 569 | 0.2% |
| 0.01 | 494 | 0.2% |
| 0.14 | 485 | 0.2% |
| 0.02 | 404 | 0.2% |
| 0.03 | 383 | 0.2% |
| 0.05 | 349 | 0.1% |
| 0.06 | 342 | 0.1% |
| 0.04 | 337 | 0.1% |
| 0.07 | 329 | 0.1% |
| 0.1 | 321 | 0.1% |
| Other values (18037) | 222633 | |
| (Missing) | 26509 | 10.5% |
| Value | Count | Frequency (%) |
| 0.01 | 494 | |
| 0.02 | 404 | |
| 0.03 | 383 | |
| 0.04 | 337 | |
| 0.05 | 349 | |
| 0.06 | 342 | |
| 0.07 | 329 | |
| 0.08 | 308 | |
| 0.09 | 260 | |
| 0.1 | 321 |
| Value | Count | Frequency (%) |
| 17450 | 2 | |
| 17449 | 2 | |
| 17448 | 2 | |
| 17447 | 2 | |
| 17446 | 2 | |
| 17445 | 2 | |
| 17444 | 2 | |
| 17443 | 2 | |
| 17442 | 2 | |
| 17441 | 2 |
| Distinct | 1727 |
|---|---|
| Distinct (%) | 0.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.9 MiB |
| 다다09b01a | 5097 |
|---|---|
| 다나06a99b | 4224 |
| 다나12b73a | 3864 |
| 다다06a00a | 3640 |
| 다나12b73b | 3610 |
| Other values (1722) |
Length
| Max length | 8 |
|---|---|
| Median length | 8 |
| Mean length | 7.993695562 |
| Min length | 1 |
Characters and Unicode
| Total characters | 2023644 |
|---|---|
| Distinct characters | 14 |
| Distinct categories | 3 |
| Distinct scripts | 3 |
| Distinct blocks | 2 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 21 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | 다나05a99b |
|---|---|
| 2nd row | 다나05a99b |
| 3rd row | 다나05a99b |
| 4th row | 다나05a99b |
| 5th row | 다나05a99b |
Common Values
| Value | Count | Frequency (%) |
| 다다09b01a | 5097 | 2.0% |
| 다나06a99b | 4224 | 1.7% |
| 다나12b73a | 3864 | 1.5% |
| 다다06a00a | 3640 | 1.4% |
| 다나12b73b | 3610 | 1.4% |
| 다다05a00a | 3457 | 1.4% |
| 다나05b99b | 3018 | 1.2% |
| 다다09b02b | 2710 | 1.1% |
| 다다11a01b | 2633 | 1.0% |
| 다다06b00a | 2566 | 1.0% |
| Other values (1717) | 218336 |
Length
Histogram of lengths of the category
Most occurring characters
| Value | Count | Frequency (%) |
| 다 | 314667 | |
| 0 | 280139 | |
| b | 256072 | |
| a | 249782 | |
| 나 | 191187 | |
| 9 | 151969 | |
| 1 | 124448 | 6.1% |
| 7 | 97535 | 4.8% |
| 8 | 75547 | 3.7% |
| 3 | 74220 | 3.7% |
| Other values (4) | 208078 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 1011936 | |
| Other Letter | 505854 | |
| Lowercase Letter | 505854 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 280139 | |
| 9 | 151969 | |
| 1 | 124448 | |
| 7 | 97535 | 9.6% |
| 8 | 75547 | 7.5% |
| 3 | 74220 | 7.3% |
| 2 | 69697 | 6.9% |
| 4 | 53634 | 5.3% |
| 5 | 48724 | 4.8% |
| 6 | 36023 | 3.6% |
Other Letter
| Value | Count | Frequency (%) |
| 다 | 314667 | |
| 나 | 191187 |
Lowercase Letter
| Value | Count | Frequency (%) |
| b | 256072 | |
| a | 249782 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 1011936 | |
| Hangul | 505854 | |
| Latin | 505854 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 280139 | |
| 9 | 151969 | |
| 1 | 124448 | |
| 7 | 97535 | 9.6% |
| 8 | 75547 | 7.5% |
| 3 | 74220 | 7.3% |
| 2 | 69697 | 6.9% |
| 4 | 53634 | 5.3% |
| 5 | 48724 | 4.8% |
| 6 | 36023 | 3.6% |
Hangul
| Value | Count | Frequency (%) |
| 다 | 314667 | |
| 나 | 191187 |
Latin
| Value | Count | Frequency (%) |
| b | 256072 | |
| a | 249782 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1517790 | |
| Hangul | 505854 | 25.0% |
Most frequent character per block
Hangul
| Value | Count | Frequency (%) |
| 다 | 314667 | |
| 나 | 191187 |
ASCII
| Value | Count | Frequency (%) |
| 0 | 280139 | |
| b | 256072 | |
| a | 249782 | |
| 9 | 151969 | |
| 1 | 124448 | |
| 7 | 97535 | 6.4% |
| 8 | 75547 | 5.0% |
| 3 | 74220 | 4.9% |
| 2 | 69697 | 4.6% |
| 4 | 53634 | 3.5% |
| Other values (2) | 84747 | 5.6% |
Auto
The auto setting is an easily interpretable pairwise column metric that uses the following mapping (vartype-vartype: method): categorical-categorical: Cramér's V; numerical-categorical: Cramér's V (using a discretized numerical column); numerical-numerical: Spearman's ρ. This configuration uses the most suitable method for each pair of columns.
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
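The calculation described above can be sketched in plain Python: rank both variables, then take the covariance of the ranks divided by the product of their standard deviations. This is a minimal illustration, not the report's own implementation; the function names and the toy data are assumptions.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: y = x**3 is monotonic in x but not linear.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
rho = spearman(x, y)
r = pearson(x, y)
print(rho, r)
```

On this toy data ρ reaches its maximum because the relationship is perfectly monotonic, while Pearson's r stays below 1 because the relationship is not linear.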
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
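The formula and the invariance property described above can be checked with a short sketch. The function name and the toy data are assumptions made for illustration only.

```python
def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = (sum((a - mx) ** 2 for a in x) / n) ** 0.5
    sy = (sum((b - my) ** 2 for b in y) / n) ** 0.5
    return cov / (sx * sy)


# Toy data (hypothetical values).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
r = pearson_r(x, y)

# Shifting and rescaling each variable separately (with positive scale factors)
# leaves r unchanged.
r_transformed = pearson_r([10 + 3 * a for a in x], [-5 + 0.5 * b for b in y])

# For exact linear functions the slope does not matter: both give r close to 1.
r_steep = pearson_r(x, [100 * a + 1 for a in x])
r_shallow = pearson_r(x, [0.01 * a + 1 for a in x])

print(r, r_transformed, r_steep, r_shallow)
```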
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
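The pair-counting definition above corresponds to the variant usually called tau-a. The following sketch implements that definition literally; tie-adjusted variants can give different values when the data contain ties. The function name and the toy data are assumptions.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both variables move in the same direction
        elif sign < 0:
            discordant += 1  # the variables move in opposite directions
        # Pairs tied in x or y count as neither concordant nor discordant.
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Toy data: two adjacent swaps relative to perfect agreement.
x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]
tau = kendall_tau_a(x, y)
print(tau)
```

With 10 pairs in total, 8 concordant and 2 discordant, the toy data give τ = (8 - 2) / 10.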
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
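One common way to obtain the nullity correlation described above is to turn each column into a present/missing indicator and compute the Pearson correlation between the indicators. The sketch below assumes that approach; the function name and the toy columns are hypothetical and only mimic the pattern in this dataset, where change and ranking are missing in the same rows.

```python
def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    sa = sum((p - ma) ** 2 for p in a) ** 0.5
    sb = sum((q - mb) ** 2 for q in b) ** 0.5
    if sa == 0 or sb == 0:
        # A column that is always present (or always missing) has no variation.
        return float("nan")
    return cov / (sa * sb)


# Toy columns (hypothetical values): the first two are missing in the same rows.
change = [None, None, 0.03, -0.28, 0.12, None]
ranking = [None, None, 8665.0, 15109.0, 3559.0, None]
other = [0.9, None, 0.0, 0.03, None, 0.0]

together = nullity_correlation(change, ranking)
unrelated = nullity_correlation(change, other)
print(together, unrelated)
```

A value near 1 means the two columns tend to be present or missing together, a value near -1 means one tends to be present when the other is missing, and a value near 0 means the missingness patterns are unrelated.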
First rows
| | Date | Store_name | si_gune_gu | Dong | loc | cate1 | cate2 | jeju_person_sales | jeju_person_sales_num | other_person_sales | other_person_sales_num | tot_sales | tot_sales_num | change | ranking | ID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 202101 | 한국맥도날드(유)제주노형점 | 제주시 | 노형동 | 제주시 동지역 | 패스트푸드 | 햄버거 | 29.36 | 98.18 | 0.90 | 27.71 | 4.68 | 102.23 | NaN | NaN | 다나05a99b |
| 1 | 202101 | 버거킹제주이마트점 | 제주시 | 노형동 | 제주시 동지역 | 패스트푸드 | 햄버거 | 9.60 | 29.89 | 0.39 | 12.12 | 1.63 | 33.69 | NaN | NaN | 다나05a99b |
| 2 | 202101 | 투썸플레이스 제주노형오거리점 | 제주시 | 노형동 | 제주시 동지역 | 음료 | 커피 | 3.49 | 11.14 | 0.14 | 4.31 | 0.59 | 12.40 | NaN | NaN | 다나05a99b |
| 3 | 202101 | 뚜레쥬르 노형오거리점 | 제주시 | 노형동 | 제주시 동지역 | 간식 | 베이커리 | 3.40 | 9.41 | 0.08 | 1.79 | 0.52 | 9.20 | NaN | NaN | 다나05a99b |
| 4 | 202101 | 에이바우트커피아이파크점 | 제주시 | 노형동 | 제주시 동지역 | 음료 | 커피 | 1.37 | 8.91 | 0.04 | 2.10 | 0.21 | 8.98 | NaN | NaN | 다나05a99b |
| 5 | 202101 | 던킨도너츠신제주점 | 제주시 | 노형동 | 제주시 동지역 | 간식 | 도너츠 | 2.66 | 8.44 | 0.04 | 1.60 | 0.38 | 8.24 | NaN | NaN | 다나05a99b |
| 6 | 202101 | 주식회사 파스쿠찌제주노형로터리점 | 제주시 | 노형동 | 제주시 동지역 | 음료 | 커피 | 2.17 | 8.22 | 0.03 | 0.92 | 0.31 | 7.58 | NaN | NaN | 다나05a99b |
| 7 | 202101 | (유)아웃백스테이크하우스코리아제주점 | 제주시 | 노형동 | 제주시 동지역 | 양식 | 패밀리 레스토랑 | 19.13 | 7.97 | 0.67 | 2.59 | 3.14 | 8.53 | NaN | NaN | 다나05a99b |
| 8 | 202101 | 더치앤빈 노형점 | 제주시 | 노형동 | 제주시 동지역 | 음료 | 커피 | 1.29 | 7.43 | 0.04 | 2.14 | 0.21 | 7.77 | NaN | NaN | 다나05a99b |
| 9 | 202101 | 타이거커피 제주점 | 제주시 | 노형동 | 제주시 동지역 | 한식 | 단품요리 전문 | 0.99 | 6.59 | 0.01 | 1.03 | 0.14 | 6.28 | NaN | NaN | 다나05a99b |
Last rows
| | Date | Store_name | si_gune_gu | Dong | loc | cate1 | cate2 | jeju_person_sales | jeju_person_sales_num | other_person_sales | other_person_sales_num | tot_sales | tot_sales_num | change | ranking | ID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 253145 | 202207 | 경성주막1929삼화점 | 제주시 | 화북동 | 제주시 동지역 | 한식 | 가정식 | 0.30 | 0.45 | 0.0 | 0.0 | 0.02 | 0.08 | 0.03 | 8665.0 | 다다14a02b |
| 253146 | 202207 | 현아식당 | 제주시 | 화북동 | 제주시 동지역 | 한식 | 가정식 | 0.04 | 0.28 | 0.0 | 0.0 | 0.00 | 0.05 | 0.12 | 3559.0 | 다다13a02b |
| 253147 | 202207 | 서로푸드 | 제주시 | 화북동 | 제주시 동지역 | 한식 | 단품요리 전문 | 0.04 | 0.13 | 0.0 | 0.0 | 0.00 | 0.02 | 0.00 | 11660.0 | 다다14a03a |
| 253148 | 202207 | 투다리한라점 | 제주시 | 화북동 | 제주시 동지역 | 주점및주류판매 | 꼬치구이 | 0.39 | 0.55 | 0.0 | 0.0 | 0.03 | 0.10 | 0.11 | 4091.0 | 다다13a03b |
| 253149 | 202207 | 뱃사공 | 제주시 | 화북동 | 제주시 동지역 | 한식 | 단품요리 전문 | 0.56 | 0.25 | 0.0 | 0.0 | 0.04 | 0.04 | 0.04 | 7528.0 | 다다13a03b |
| 253150 | 202207 | 남문회센타 | 제주시 | 화북동 | 제주시 동지역 | 한식 | 회 | 0.34 | 0.28 | 0.0 | 0.0 | 0.02 | 0.05 | 0.04 | 7757.0 | 다다13a03a |
| 253151 | 202207 | 황금성반점 | 제주시 | 화북동 | 제주시 동지역 | 아시아음식 | 중식 | 0.56 | 1.36 | 0.0 | 0.0 | 0.04 | 0.24 | 0.03 | 7906.0 | 다다12a02b |
| 253152 | 202207 | 남문두루치기 | 제주시 | 화북동 | 제주시 동지역 | 한식 | 단품요리 전문 | 0.16 | 0.30 | 0.0 | 0.0 | 0.01 | 0.06 | 0.02 | 9099.0 | 다다13a03b |
| 253153 | 202207 | 갈비정식 | 제주시 | 화북동 | 제주시 동지역 | 한식 | 갈비 | 0.18 | 0.23 | 0.0 | 0.0 | 0.01 | 0.04 | 0.05 | 6841.0 | 다다13a03a |
| 253154 | 202207 | 불타는여고24시떡볶이삼화점 | 제주시 | 화북동 | 제주시 동지역 | 간식 | 분식 | 0.18 | 0.43 | 0.0 | 0.0 | 0.01 | 0.07 | 0.05 | 7094.0 | 다다14a03a |
Most frequently occurring
| | Date | Store_name | si_gune_gu | Dong | loc | cate1 | cate2 | jeju_person_sales | jeju_person_sales_num | other_person_sales | other_person_sales_num | tot_sales | tot_sales_num | change | ranking | ID | # duplicates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 202110 | 금정아트 민화공방 | 제주시 | 일도1동 | 제주시 동지역 | 간식 | 베이커리 | 0.02 | 0.03 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 | 4141.0 | 다다09a02b | 2 |
| 1 | 202110 | 칼맛 | 서귀포시 | 대륜동 | 서귀포시 동지역 | 아시아음식 | 일식 | 1.09 | 0.38 | 0.03 | 0.08 | 0.09 | 0.17 | -0.28 | 15109.0 | 다나08a74a | 2 |
| 2 | 202111 | 1월구름 | 서귀포시 | 안덕면 | 안덕면 | 간식 | 베이커리 | 0.02 | 0.06 | 0.00 | 0.01 | 0.00 | 0.02 | -0.04 | 8195.0 | 나나91a73a | 2 |
| 3 | 202111 | 올푸드 | 서귀포시 | 정방동 | 서귀포시 동지역 | 패스트푸드 | 피자 | 0.02 | 0.03 | 0.00 | 0.00 | 0.00 | 0.01 | -0.01 | 6120.0 | 다나13a73a | 2 |
| 4 | 202111 | 칼맛 | 서귀포시 | 대륜동 | 서귀포시 동지역 | 아시아음식 | 일식 | 0.40 | 0.10 | 0.00 | 0.01 | 0.03 | 0.04 | -0.02 | 7179.0 | 다나08a74a | 2 |
| 5 | 202111 | 패자부활전 | 제주시 | 이도2동 | 제주시 동지역 | 한식 | 단품요리 전문 | 0.08 | 0.09 | 0.00 | 0.01 | 0.01 | 0.04 | -0.05 | 8766.0 | 다다09b01a | 2 |